- The Spatial Data Lab (SDL) project is a collaborative initiative by the Center for Geographic Analysis at Harvard University, KNIME, Future Data Lab, China Data Institute, and George Mason University. Co-sponsored by the NSF IUCRC Spatiotemporal Innovation Center, SDL aims to advance applied spatiotemporal research in domains such as business, the environment, health, and mobility. The project focuses on developing open-source infrastructure for data linkage, analysis, and collaboration. Key objectives include building spatiotemporal data services, a reproducible, replicable, and expandable (RRE) platform, and workflow-driven data analysis tools to support research case studies. SDL also promotes spatiotemporal data science training, cross-party collaboration, and the creation of geospatial tools that foster inclusivity, transparency, and ethical practices. Guided by an academic advisory committee of world-renowned scholars, the project is laying the foundation for a more open, effective, and robust scientific enterprise.
  Free, publicly-accessible full text available December 1, 2026
- Free, publicly-accessible full text available May 1, 2026
- Breathing in fine particulate matter with a diameter of less than 2.5 µm (PM2.5) greatly increases an individual's risk of cardiovascular and respiratory diseases. As climate change progresses, extreme weather events, including wildfires, are expected to become more frequent, exacerbating air pollution. However, models often struggle to capture extreme pollution events because high PM2.5 levels are rare in training datasets. To address this, we implemented cluster-based undersampling and trained Transformer models to improve extreme-event prediction using two cutoff thresholds (12.1 µg/m³ and 35.5 µg/m³) and five partial sampling ratios (10/90, 20/80, 30/70, 40/60, and 50/50). The 35.5 µg/m³ threshold paired with a 20/80 partial sampling ratio achieved the best performance (RMSE 2.080, MAE 1.386, R² 0.914) and was particularly strong at forecasting high-PM2.5 events. Overall, models trained on augmented data significantly outperformed those trained on the original data, highlighting the importance of resampling techniques for air quality forecasting accuracy, especially in high-pollution scenarios. These findings offer insights into optimizing air quality forecasting models and enable more reliable predictions of extreme pollution events. By improving the ability to forecast high PM2.5 levels, this study supports more informed public health and environmental policies to mitigate the impacts of air pollution. It also advances the technology for building better air quality digital twins.
  Free, publicly-accessible full text available February 1, 2026
- Data dashboards share multiple data products at a glance and were ubiquitous during the COVID-19 pandemic. They tracked global and country-specific statistics and provided cartographic visualizations of cases, deaths, vaccination rates, and other metrics. We examined the role of geospatial data on COVID-19 dashboards in the form of maps, charts, and graphs. We organized our review of 193 COVID-19 dashboards by region. We then compared the accessibility and operationality of the dashboards over time, as well as their use of web maps and geospatial visualizations. Only 17% of the dashboards reviewed included geospatial visualizations, and 66% are no longer accessible, which led us to consider the ephemeral nature of data and dashboards. We conclude with a call to action: coordinated efforts to standardize, store, and maintain geospatial data for data dashboards and web maps are needed. Such efforts would support long-term use, analysis, and monitoring for current and future public health and other challenges.
  Free, publicly-accessible full text available January 1, 2026
- Free, publicly-accessible full text available January 1, 2026
- Accurate air pollution monitoring is critical to understanding and mitigating the impacts of air pollution on human health and ecosystems. Advanced, highly accurate air pollutant sensors are limited in number and geographic coverage, so many low-cost, lower-accuracy sensors have been deployed. Calibrating these low-cost sensors is essential to fill the gaps in coverage. We systematically examined how different machine learning (ML) models and open-source packages could improve the accuracy of particulate matter (PM) 2.5 data collected by Purple Air sensors. Eleven ML models and five packages were examined. Both the model and the package affected accuracy, whereas the random training/testing split ratio (e.g., 80/20 vs. 70/30) had minimal impact (a 0.745% difference in R²). Long Short-Term Memory (LSTM) models trained in RStudio and TensorFlow performed best, with R² scores of 0.856 and 0.857 and Root Mean Squared Errors (RMSEs) of 4.25 µg/m³ and 4.26 µg/m³, respectively. However, LSTM models may be too slow (1.5 h training time) or too computation-intensive for applications with fast response requirements. Tree-based models, including XGBoost in RStudio (R² 0.7612, RMSE 5.377 µg/m³) and Random Forest (RF) in TensorFlow (R² 0.7632, RMSE 5.366 µg/m³), offered good performance with much shorter training times (<1 min) and may be suitable for such applications. These findings suggest that AI/ML models, particularly LSTM models, can effectively calibrate low-cost sensors to produce precise, localized air quality data. This research is among the most comprehensive studies on AI/ML for air pollutant calibration. We also discuss limitations, applicability to other sensors, and explanations for the models' performance.
  This research can be adapted to enhance air quality monitoring for public health risk assessments, support broader environmental health initiatives, and inform policy decisions.
  Free, publicly-accessible full text available February 1, 2026
- With recent advancements, large language models (LLMs) such as ChatGPT and Bard have shown the potential to disrupt many industries, from customer service to healthcare. Traditionally, humans interact with geospatial data through software (e.g., ArcGIS 10.3) and programming languages (e.g., Python). In this pioneering study, we explore the possibility of using an LLM as a natural-language interface to geospatial datasets. To achieve this, we propose a framework that:
  (1) trains an LLM to understand the datasets,
  (2) generates geospatial SQL queries from a natural language question,
  (3) sends each SQL query to the backend database, and
  (4) parses the database response back into natural language.
  As a proof of concept, we conducted a case study on real-world data to evaluate the framework's performance on various queries. The results show that LLMs can generate accurate SQL code in most cases, including spatial joins, although there is still room for improvement. Because all geospatial data can be stored in a spatial database, we hope this framework can serve as a proxy to improve the efficiency of spatial data analyses and unlock the possibility of automated geospatial analytics.
- This paper analyzes the spatiotemporal patterns of nitrogen dioxide (NO2) tropospheric vertical column densities (TVCDs) before and during the second wave of COVID-19 in India. The results indicate that NO2 columns increased significantly in the 2021 reopening period before the second wave (Mar. 1 to Apr. 20), exceeding the levels of the same period in 2019. During the reopening, the relative difference from the 2010–2019 mean was 18.76% higher in 2021 than in 2019. The paper identifies five states with the largest increases in relative difference from 2019 to 2021: Odisha (33.81%), Madhya Pradesh (29.83%), Chhattisgarh (23.86%), Jharkhand (30.01%), and West Bengal (25.48%). Trends in the indices of industrial production (IIPs) suggest that these unexpected increases in tropospheric NO2 can be attributed to reopening and to elevated production across sectors including electricity, manufacturing, and mining. Analysis of NO2 TVCD levels alongside IIPs indicates markedly higher industrial activity during the 2021 reopening period than during the same period in 2019. After the second wave of COVID-19 began (Apr. 21 to Jun. 21), India re-implemented lockdown policies to mitigate the spread of the pandemic. During this period, the relative difference of total NO2 columns declined compared with 2019 because of the pandemic mitigation policies. The decline held for India as a whole and for most individual study regions. The relative declines were 6.43% for the whole country and 14.25%, 22.88%, 4.57%, and 7.89% for Odisha, Madhya Pradesh, Chhattisgarh, and Jharkhand, respectively, all of which contain large industrial clusters. In West Bengal, the change in relative difference from 2019 to 2021 during the re-lockdown period was not significant (a 0.04% increase). As with the first wave, these decreases in NO2 TVCD are mainly due to the mitigation policies during the second wave.
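The PM2.5 forecasting abstract names cluster-based undersampling with a cutoff threshold and a partial sampling ratio but does not spell out the procedure. The Python sketch below shows one common variant under stated assumptions:

- every sample at or above the cutoff (e.g., 35.5 µg/m³) is kept;
- the remaining samples are grouped with a plain k-means;
- each cluster is then randomly undersampled in proportion to its size, so that extreme samples make up roughly `extreme_share` of the result.

Here 0.20 is read as the 20/80 ratio. The function names, the use of k-means, the number of clusters, and this reading of the ratio are all assumptions, not the authors' implementation.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members (unchanged if the cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster_undersample(features, pm25, cutoff=35.5, extreme_share=0.20, k=5, seed=0):
    """Hypothetical helper: keep all samples with PM2.5 >= cutoff and undersample the rest
    cluster by cluster so extreme samples form roughly `extreme_share` of the output.
    Returns the sorted indices of the samples to train on."""
    extreme = [i for i, y in enumerate(pm25) if y >= cutoff]
    normal = [i for i, y in enumerate(pm25) if y < cutoff]
    # Normal samples to keep so that extreme : normal is about extreme_share : (1 - extreme_share).
    n_keep = min(len(normal), round(len(extreme) * (1 - extreme_share) / extreme_share))
    labels = kmeans([features[i] for i in normal], k=min(k, len(normal)), seed=seed)
    rng = random.Random(seed)
    kept = []
    for c in set(labels):
        members = [normal[j] for j, lab in enumerate(labels) if lab == c]
        # Each cluster contributes in proportion to its size (rounding makes the total approximate).
        quota = round(n_keep * len(members) / len(normal))
        kept.extend(rng.sample(members, min(quota, len(members))))
    return sorted(extreme + kept)
```

Sampling each cluster in proportion to its size tends to preserve the different regimes of ordinary conditions, which purely random undersampling can thin out unevenly. Other variants replace each cluster with its centroid instead.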
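The low-cost sensor calibration abstract compares models by R² and RMSE on a random train/test split. The Python code below is a minimal sketch of that evaluation loop:

- it fits a one-variable linear calibration, treating reference PM2.5 as a linear function of the raw low-cost sensor reading;
- it then reports R² and RMSE on the held-out split.

Linear regression is a deliberately simple stand-in, not one of the study's eleven models. The function names, the `(raw, reference)` row layout, and the fixed seed are assumptions.

```python
import math
import random


def train_test_split(rows, test_ratio=0.2, seed=42):
    """Random split; test_ratio=0.2 gives 80/20 and 0.3 gives 70/30."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def fit_linear(train):
    """Ordinary least squares for reference = a * raw + b (a simple baseline calibration)."""
    xs, ys = zip(*train)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the data (e.g. µg/m³)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def evaluate(rows, test_ratio):
    """Split (raw, reference) pairs, calibrate on the training part, score on the test part."""
    train, test = train_test_split(rows, test_ratio)
    a, b = fit_linear(train)
    raw, ref = zip(*test)
    pred = [a * x + b for x in raw]
    return r2(ref, pred), rmse(ref, pred)
```

Replacing `fit_linear` with an LSTM, XGBoost, or Random Forest model would leave the split and metric code unchanged. Calling `evaluate(rows, 0.2)` and `evaluate(rows, 0.3)` compares the 80/20 and 70/30 splits mentioned in the abstract.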
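The LLM abstract outlines a four-step framework: dataset understanding, SQL generation, database execution, and answer parsing. The Python sketch below wires those steps together against an in-memory SQLite database, under several assumptions:

- Step 1 is approximated by placing the table schema in the prompt rather than training the model.
- `ask_llm` is a hypothetical placeholder that returns canned SQL instead of calling a real LLM.
- A latitude/longitude bounding box stands in for true spatial SQL. A spatial backend such as PostGIS would express this with functions like `ST_Within`.

```python
import sqlite3


def describe_schema(conn):
    """Step 1 (approximated): summarize tables and columns so the model can 'see' the dataset."""
    lines = []
    for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'"):
        # Table names come from sqlite_master, so interpolating them here is safe for this demo.
        cols = [row[1] for row in conn.execute(f"PRAGMA table_info({name})")]
        lines.append(f"{name}({', '.join(cols)})")
    return "\n".join(lines)


def ask_llm(prompt):
    """Hypothetical stand-in for an LLM call; ignores the prompt and returns canned SQL."""
    return ("SELECT name FROM stations "
            "WHERE lon BETWEEN -77.2 AND -76.9 AND lat BETWEEN 38.8 AND 39.0 ORDER BY name")


def answer(conn, question):
    schema = describe_schema(conn)
    prompt = f"Schema:\n{schema}\n\nWrite one SQLite query that answers: {question}"
    sql = ask_llm(prompt)                    # Step 2: natural language -> SQL
    rows = conn.execute(sql).fetchall()      # Step 3: run the query on the backend database
    names = ", ".join(r[0] for r in rows)    # Step 4: turn the result back into a sentence
    return f"{len(rows)} station(s) match: {names}" if rows else "No stations match."


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stations (name TEXT, lon REAL, lat REAL)")
    conn.executemany("INSERT INTO stations VALUES (?, ?, ?)",
                     [("A", -77.0, 38.9), ("B", -76.95, 38.85), ("C", -74.0, 40.7)])
    print(answer(conn, "Which stations are in the Washington, DC area?"))
```

Keeping the model call behind a single function makes it easy to log or validate the generated SQL before step 3 runs it. That matters because, as the abstract notes, generated queries are not always correct.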
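The NO2 abstract reports relative differences from the 2010–2019 mean without giving the formula. The sketch below assumes the usual definition:

RD = (C - C_baseline) / C_baseline × 100

Here C is a period's mean NO2 column and C_baseline is the 2010–2019 mean for the same period. It also assumes that "18.76% higher" means a difference in percentage points between the 2021 and 2019 relative differences. Both readings, and the function names, are assumptions.

```python
def relative_difference(value, baseline_values):
    """Percent difference of one period's mean column from the multi-year baseline mean (assumed definition)."""
    baseline = sum(baseline_values) / len(baseline_values)
    return (value - baseline) / baseline * 100.0


def change_in_rd(v2019, v2021, baseline_values):
    """Percentage-point gap between the 2021 and 2019 relative differences (assumed reading)."""
    return relative_difference(v2021, baseline_values) - relative_difference(v2019, baseline_values)
```

Take hypothetical numbers that are not from the paper: a baseline mean of 4.0 (arbitrary column units). A 2019 value of 4.2 gives RD = 5%, and a 2021 value of 5.0 gives RD = 25%. The 2021 relative difference is therefore 20 percentage points higher.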
 An official website of the United States government
